Fundamentals Formal Foundations and Semantics of Data Extraction

Authors

  • Robert Baumgartner
  • Wolfgang Gatterbauer
  • Georg Gottlob
Abstract

SYNONYMS: web data extraction toolkit, web information extraction system, wrapper generator, wrapper generator toolkit, web macros, web scraper.

DEFINITION: A web data extraction system is a software system that automatically and repeatedly extracts data from web pages with changing content and delivers the extracted data to a database or some other application. The task of web data extraction performed by such a system is usually divided into five different functions: (1) web interaction, which mainly comprises navigation to usually predetermined target web pages containing the desired information; (2) support for wrapper generation and execution, where a wrapper is a program that identifies the desired data on target pages, extracts the data, and transforms it into a structured format; (3) scheduling, which allows repeated application of previously generated wrappers to their respective target pages; (4) data transformation, which includes filtering, transforming, refining, and integrating data extracted from one or more sources and structuring the result according to a desired output format (usually XML or relational tables); and (5) delivery of the resulting structured data to external applications such as database management systems, data warehouses, business software systems, content management systems, decision support systems, RSS publishers, email servers, or SMS servers. Alternatively, the output can be used to generate new web services out of existing and continually changing web sources.
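The five functions named in the definition can be illustrated with a minimal Python sketch. It is not taken from the article: the sample page, the `ItemWrapper` class, the `items` table, and all function names are hypothetical, and web interaction is stubbed with a fixed HTML string rather than a real fetch. Only standard-library modules are used.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical target page; a real system would obtain it by navigating the web.
SAMPLE_PAGE = """
<table>
  <tr class="item"><td class="name">Widget</td><td class="price">$4.50</td></tr>
  <tr class="item"><td class="name">Gadget</td><td class="price">$12.00</td></tr>
</table>
"""


class ItemWrapper(HTMLParser):
    """(2) Wrapper: locates name/price cells and collects them as records."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # name of the cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and attrs.get("class") == "item":
            self.records.append({})
        elif tag == "td" and attrs.get("class") in ("name", "price"):
            self._field = attrs["class"]

    def handle_data(self, data):
        if self._field and self.records:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None


def fetch_page():
    """(1) Web interaction, stubbed: returns the fixed sample page."""
    return SAMPLE_PAGE


def run_wrapper(html):
    """(2) Wrapper execution: raw HTML in, list of string records out."""
    wrapper = ItemWrapper()
    wrapper.feed(html)
    return wrapper.records


def transform(records):
    """(4) Data transformation: convert prices to numbers, emit relational rows."""
    return [(r["name"], float(r["price"].lstrip("$"))) for r in records]


def deliver(rows, conn):
    """(5) Delivery: store the structured rows in a database."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, price REAL)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    conn.commit()


def run_once(conn):
    """One extraction run; (3) a scheduler would call this repeatedly."""
    deliver(transform(run_wrapper(fetch_page())), conn)


conn = sqlite3.connect(":memory:")
run_once(conn)
rows = conn.execute("SELECT name, price FROM items ORDER BY name").fetchall()
```

Keeping each function as a separate step mirrors the division in the definition: the wrapper can be regenerated when the page layout changes without touching transformation or delivery, and scheduling reduces to invoking `run_once` at the desired interval.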

Similar resources

Formal Foundations of General System Modeling

We present an approach to the definition of an object-oriented modeling paradigm done in the scope of general system modeling. The paradigm includes a formally defined metamodel and its supporting philosophical and natural science foundations. The metamodel exhibits its internal consistency, supported by Russell’s theory of types, and its consistency in interpretation of subjects of modeling, s...

Towards Formal Foundations of Event Queries and Rules

The field of complex event processing still lacks formal foundations. In particular, event queries require both declarative and operational semantics. We put forward for discussion a proposal towards formal foundations of event queries that aims at making well-known results from database queries applicable to event queries. Declarative semantics of event queries and rules are given as a model t...

Loose Semantics for UML/OCL

This paper deals with formal foundations for a subset of the UML notation (subset of class diagrams and constraints in OCL). There are already various proposals for semantics of UML and a few for OCL. Nevertheless, it is argued that these approaches are not fully adequate for building a conceptual bridge between the programming artifacts produced from UML/OCL and the formal semantics. A differe...

Fundamentals and Pragmatics of an Entity-Relationship Approach

Studying modern database languages, one recognizes that there is a gap between language features and theoretical foundations: studies of the formal foundations exist for the relational data model but not for the Entity-Relationship model, which is a model used by numerous practical people. Also, most extensions of the Entity-Relationship model and other semantic data models lack a pre...

Spatial Role Labeling Annotation Scheme

Given the large body of the past research on various aspects of spatial information, the main obstacles for employing machine learning for extraction of this type of information from natural language have been: a) the lack of an agreement on a unique semantic model for spatial information; b) the diversity of formal spatial representation models; c) the gap between the expressiveness of natura...

Publication date: 2008